Electronic Publishing of Digitised Works
Abstract
This paper describes the automated process for creating structured master and access copies of the digitised works at the BND (National Digital Library). During 2004 and 2005 the BND created nearly half a million digitised images from more than 25,000 titles of printed works, manuscripts, drawings and maps. The result of the digitisation process is a group of TIFF image files representing the surfaces of the original works, which still need to be processed before they can be stored and published. Doing that manually would be a very complex and expensive task, with risks for the uniformity of the results, so an automated solution had to be developed. To create the technical metadata, apply image processing actions and OCR, and create derived access copies in PNG, JPG, GIF and PDF, we developed a tool named SECO. To create the master copies of each work for preservation, and the access copies in HTML, we developed a tool named CONTENTE, which exists both as a standalone tool and as a library. Finally, the copies are deposited and registered in the BND repository through the PURL.PT service, which also handles access control for the web and the intranet. This complex process is fully automated and is driven by several XML schemas covering the control of the processes, the description of the results (including the OCR outputs), descriptive metadata (in Dublin Core, MARC XML, etc.), and rights and structural metadata (in METS).
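As an illustration of the structural metadata mentioned in the abstract, the following is a minimal sketch of how a METS file section and physical structure map might be generated for a set of TIFF masters. It is not the SECO or CONTENTE implementation, whose internals the abstract does not describe. The function name `build_mets`, the work identifier, the `IMG0001`-style file IDs and the file names are all hypothetical; only the METS element and attribute names come from the METS schema.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
XLINK_NS = "http://www.w3.org/1999/xlink"
ET.register_namespace("mets", METS_NS)
ET.register_namespace("xlink", XLINK_NS)


def q(ns, tag):
    """Return a namespace-qualified name in ElementTree's {uri}tag form."""
    return f"{{{ns}}}{tag}"


def build_mets(work_id, tiff_names):
    """Build a minimal METS tree: one master file group and one physical
    structure map with a page division per TIFF image (hypothetical layout)."""
    root = ET.Element(q(METS_NS, "mets"), {"OBJID": work_id})
    file_sec = ET.SubElement(root, q(METS_NS, "fileSec"))
    group = ET.SubElement(file_sec, q(METS_NS, "fileGrp"), {"USE": "master"})
    struct_map = ET.SubElement(root, q(METS_NS, "structMap"), {"TYPE": "physical"})
    work_div = ET.SubElement(struct_map, q(METS_NS, "div"), {"TYPE": "work"})

    # Page order is assumed to follow the sorted file names.
    for order, name in enumerate(sorted(tiff_names), start=1):
        file_id = f"IMG{order:04d}"  # hypothetical ID scheme
        file_el = ET.SubElement(
            group, q(METS_NS, "file"), {"ID": file_id, "MIMETYPE": "image/tiff"}
        )
        ET.SubElement(
            file_el,
            q(METS_NS, "FLocat"),
            {"LOCTYPE": "URL", q(XLINK_NS, "href"): name},
        )
        page = ET.SubElement(
            work_div, q(METS_NS, "div"), {"TYPE": "page", "ORDER": str(order)}
        )
        ET.SubElement(page, q(METS_NS, "fptr"), {"FILEID": file_id})
    return root


if __name__ == "__main__":
    tree = build_mets("work-0001", ["0002.tif", "0001.tif"])
    print(ET.tostring(tree, encoding="unicode"))
```

Keeping the file inventory (`fileSec`) separate from the page sequence (`structMap`) is what would let derived access copies be added later as further file groups without touching the structure of the work.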
Similar resources
MathDoc and the Electronic Publishing of Mathematics
France has a long tradition in the publication of mathematics. The very first “mathematics only” journal in the world, the Annales de Gergonne was published from 1810 to 1831 and several of the foremost current mathematical journals are published in France today. This paper will develop the original work of the “MathDoc” team to make accessible these and other journals, and more generally to pr...
The Internet Library of Early Journals: An Electronic Library of Primary Sources on the Internet
Introduction: The Internet Library of Early Journals (ILEJ) owes its existence to the UK Higher Education Libraries Review, chaired by Sir Brian Follett, the report of which appeared in 1993. The Follett Report, as it is generally called, discussed the implications of information technology in enhancing the work of libraries and proposed a programme of development in key areas of IT within t...
Dual Spacization Approach to the Electronic Publishing
Dual spacization of publishing means the emergence of digital publishing in online and offline virtual environments alongside analogue publishing. Analogue publishing is publishing produced in the form of physical printed writings, whether on a single sheet of paper, in single-page or multi-page newspapers, magazines and books, or as writings on leaves and pieces of trees, natural skin and lea...
OCR Alternatives for Electronic Publishing of Digitised Documents
This paper describes a general approach to automatically preparing digitised documents for storage and processing on various digital platforms. The focus is on documents that are not suitable for optical character recognition (OCR) methods but provide regular structures in the form of text-like blocks. By extracting a document-immanent alphabet, preserving the graphical representa...
How Dynamic E-journals can Interconnect Open Access Archives
Influential scientists are urging journal publishers to free their published works so they can be accessed in comprehensive digital archives. That would create the opportunity for new services that dynamically interconnect material in the archives. To achieve this, two issues endemic to scholarly journal publishing need to be tackled: decoupling journal content from publishing process; defragme...
Publication date: 2006